Convex recovery of tensors using nuclear norm penalization
The subdifferential of convex functions of the singular spectrum of real
matrices has been widely studied in matrix analysis, optimization and automatic
control theory. Convex analysis and optimization over spaces of tensors is now
gaining much interest due to its potential applications to signal processing,
statistics and engineering. The goal of this paper is to present an
application to the problem of low-rank tensor recovery based on linear random
measurements by extending the results of Tropp to the tensor setting.
Comment: To appear in the proceedings of LVA/ICA 2015, Czech Republic
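The workhorse of nuclear-norm penalization is singular-value soft-thresholding, the proximal operator of the matrix nuclear norm; tensor methods typically apply it to matricizations of the tensor. A minimal sketch (not the paper's recovery algorithm; the rank, sizes, and noise level are illustrative):

```python
import numpy as np

def prox_nuclear(X, tau):
    """Singular-value soft-thresholding: the proximal operator of the
    nuclear norm, the basic step in convex low-rank recovery."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)  # shrink singular values toward zero
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
L = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))  # rank-2 signal
noisy = L + 0.01 * rng.standard_normal((20, 20))
denoised = prox_nuclear(noisy, tau=0.5)
print(np.linalg.matrix_rank(denoised, tol=1e-8))
```

Because the threshold exceeds the singular values contributed by the noise but not those of the rank-2 signal, the soft-thresholded matrix is exactly rank 2.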
Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for
count-valued tensor data, and develop scalable inference algorithms (both batch
and online) for dealing with massive tensors. Our generative model can handle
overdispersed counts as well as infer the rank of the decomposition. Moreover,
leveraging a reparameterization of the Poisson distribution as a multinomial
facilitates conjugacy in the model and enables simple and efficient Gibbs
sampling and variational Bayes (VB) inference updates, with a computational
cost that only depends on the number of nonzeros in the tensor. The model also
provides a nice interpretability for the factors; in our model, each factor
corresponds to a "topic". We develop a set of online inference algorithms that
allow further scaling up the model to massive tensors, for which batch
inference methods may be infeasible. We apply our framework on diverse
real-world applications, such as \emph{multiway} topic modeling on a scientific
publications database, analyzing a political science data set, and analyzing a
massive household transactions data set.
Comment: ECML PKDD 201
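The reparameterization at the heart of the conjugacy argument is Poisson additive thinning: a Poisson count whose rate is a sum of per-factor rates, allocated multinomially across the factors, is distributed as independent per-factor Poisson counts. A small sketch (the rates are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-topic rates for one tensor entry,
# e.g. lambda_r = u_i[r] * v_j[r] * w_k[r] in a CP-style model.
rates = np.array([0.5, 2.0, 1.5])

# Sampling the total count, then allocating it multinomially in
# proportion to the rates, is equivalent to sampling independent
# Poisson counts per topic; these latent allocations are what make
# the Gibbs/VB updates conjugate.
total = rng.poisson(rates.sum())
alloc = rng.multinomial(total, rates / rates.sum())
print(total, alloc)
```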
Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Tensor decompositions are used in various data mining applications from
social network to medical applications and are extremely useful in discovering
latent structures or concepts in the data. Many real-world applications are
dynamic in nature and so are their data. To deal with this dynamic nature of
data, there exist a variety of online tensor decomposition algorithms. A
central assumption in all those algorithms is that the number of latent
concepts remains fixed throughout the entire stream. However, this need not be
the case. Every incoming batch in the stream may have a different number of
latent concepts, and the difference in latent concepts from one tensor batch to
another can provide insights into how our findings in a particular application
behave and deviate over time. In this paper, we define "concept" and "concept
drift" in the context of streaming tensor decomposition, as the manifestation
of the variability of latent concepts throughout the stream. Furthermore, we
introduce SeekAndDestroy, an algorithm that detects concept drift in streaming
tensor decomposition and is able to produce results robust to that drift. To
the best of our knowledge, this is the first work that investigates concept
drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy
on synthetic datasets, which exhibit a wide variety of realistic drift. Our
experiments demonstrate the effectiveness of SeekAndDestroy, both in the
detection of concept drift and in the alleviation of its effects, producing
results with similar quality to decomposing the entire tensor in one shot.
Additionally, in real datasets, SeekAndDestroy outperforms other streaming
baselines, while discovering novel, useful components.
Comment: 16 pages, accepted at ECML-PKDD 201
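As a toy illustration of what "a different number of latent concepts per batch" means (a crude SVD-based proxy, not SeekAndDestroy itself), one can count the significant singular values of a mode-1 unfolding of each incoming tensor batch:

```python
import numpy as np

def estimate_num_concepts(batch, threshold=1e-8):
    """Crude proxy for the number of latent concepts in a batch:
    count singular values of the mode-1 unfolding that exceed a
    small fraction of the largest one."""
    unfolding = batch.reshape(batch.shape[0], -1)
    s = np.linalg.svd(unfolding, compute_uv=False)
    return int(np.sum(s > threshold * s[0]))

rng = np.random.default_rng(2)

def rank_r_batch(r):
    # Synthetic CP-rank-r tensor batch of shape 10 x 8 x 6.
    A, B, C = (rng.standard_normal((n, r)) for n in (10, 8, 6))
    return np.einsum('ir,jr,kr->ijk', A, B, C)

ranks = [estimate_num_concepts(rank_r_batch(r)) for r in (2, 3, 2)]
print(ranks)  # a change in the count between batches signals drift
```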
Tensor Product Approximation (DMRG) and Coupled Cluster method in Quantum Chemistry
We present the Coupled Cluster (CC) method and the Density Matrix
Renormalization Group (DMRG) method in a unified way, from the perspective of
recent developments in tensor product approximation. We present an introduction
into recently developed hierarchical tensor representations, in particular
tensor trains which are matrix product states in physics language. The discrete
equations of full CI approximation applied to the electronic Schr\"odinger
equation are cast into a tensorial framework in the form of second
quantization. A further approximation is then performed by tensor
approximation within a hierarchical format or equivalently a tree tensor
network. We establish the (differential) geometry of low rank hierarchical
tensors and apply the Dirac-Frenkel principle to reduce the original
high-dimensional problem to low dimensions. The DMRG algorithm is established
as an optimization method in this format with alternating directional search.
We briefly introduce the CC method and refer to our theoretical results. We
compare this approach in the present discrete formulation with the CC method
and its underlying exponential parametrization.
Comment: 15 pages, 3 figures
Error Analysis of TT-Format Tensor Algorithms
The tensor train (TT) decomposition is a representation technique for arbitrary tensors that allows efficient storage and computation. For a d-dimensional tensor with d ≥ 2, the decomposition consists of two ordinary matrices and d - 2 third-order tensors. In this paper we prove that the TT decomposition of an arbitrary tensor can be computed (or approximated, for data-compression purposes) by means of a backward stable algorithm based on computations with Householder matrices. Moreover, multilinear forms with tensors represented in TT format can be computed efficiently with a small backward error.
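For reference, the standard TT-SVD procedure, sequential truncated SVDs of successive unfoldings, can be sketched as follows; the paper's contribution is a Householder-based variant of this computation with a backward-stability proof, which this sketch does not reproduce:

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Compute TT cores of a dense tensor by successive truncated SVDs
    (the classic TT-SVD algorithm)."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for n in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))  # truncation rank
        cores.append(U[:, :r].reshape(r_prev, dims[n], r))
        # Carry the remainder to the next unfolding.
        mat = (np.diag(s[:r]) @ Vt[:r]).reshape(r * dims[n + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(3)
T = rng.standard_normal((4, 5, 6, 7))
cores = tt_svd(T)
err = np.linalg.norm(tt_reconstruct(cores) - T) / np.linalg.norm(T)
print(err)
```

With no truncation beyond numerical noise, the reconstruction is exact up to floating-point error.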
An Iterative Model Reduction Scheme for Quadratic-Bilinear Descriptor Systems with an Application to Navier-Stokes Equations
We discuss model reduction for a particular class of quadratic-bilinear (QB)
descriptor systems. The main goal of this article is to extend the recently
studied interpolation-based optimal model reduction framework for QBODEs
[Benner et al. '16] to a class of descriptor systems in an efficient and
reliable way. Recently, it has been shown in the case of linear or bilinear
systems that a direct extension of interpolation-based model reduction
techniques to descriptor systems, without any modifications, may lead to poor
reduced-order systems. Therefore, for the analysis, we aim at transforming the
considered QB descriptor system into an equivalent QBODE system by means of
projectors for which standard model reduction techniques for QBODEs can be
employed, including aforementioned interpolation scheme. Subsequently, we
discuss related computational issues, thus resulting in a modified algorithm
that allows us to construct \emph{near}--optimal reduced-order systems without
explicitly computing the projectors used in the analysis. The efficiency of the
proposed algorithm is illustrated by means of a numerical example, obtained via
semi-discretization of the Navier-Stokes equations.
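For readers unfamiliar with projection-based reduction, the basic mechanism the article builds on can be shown in its simplest linear form: project the state onto a low-dimensional basis V and form reduced system matrices. This is a generic Galerkin sketch with random data, not the interpolation-based QB descriptor algorithm of the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 50, 5                 # full and reduced state dimensions (illustrative)
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # system matrix of x' = A x + B u
B = rng.standard_normal((n, 1))

V, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal trial basis
Ar = V.T @ A @ V             # reduced r x r system matrix
Br = V.T @ B                 # reduced input matrix
print(Ar.shape, Br.shape)
```

The quality of such a reduced model hinges entirely on the choice of V, which is exactly what interpolation-based frameworks construct systematically.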
Expert recommendation via tensor factorization with regularizing hierarchical topical relationships
© Springer Nature Switzerland AG 2018. Knowledge acquisition and exchange are generally crucial yet costly for both businesses and individuals, especially when the knowledge spans various areas. Question Answering Communities offer an opportunity for sharing knowledge at a low cost, where community users, many of whom are domain experts, can potentially provide high-quality solutions to a given problem. In this paper, we propose a framework for finding experts across multiple collaborative networks. We employ the recent techniques of tree-guided learning (via tensor decomposition) and matrix factorization to explore user expertise from past voted posts. Tensor decomposition makes it possible to leverage the latent expertise of users, and the posts and related tags help identify the related areas. The final result is an expertise score for every user on every knowledge area. We experiment on Stack Exchange Networks, a set of question-answering websites on different topics with a huge group of users and posts. Experiments show that our proposed approach produces consistent, high-quality results.
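As a rough illustration of factorization-based expertise scoring (plain NMF with Lee-Seung multiplicative updates, not the paper's tree-guided tensor model; all sizes and names are synthetic):

```python
import numpy as np

rng = np.random.default_rng(5)
votes = rng.random((30, 40))   # synthetic user x post vote matrix
k = 4                          # assumed number of knowledge areas
W = rng.random((30, k))        # user x area loadings
H = rng.random((k, 40))        # area x post loadings

for _ in range(200):           # Lee-Seung multiplicative updates for NMF
    H *= (W.T @ votes) / (W.T @ W @ H + 1e-12)
    W *= (votes @ H.T) / (W @ H @ H.T + 1e-12)

# Normalize each user's loadings into per-area "expertise" scores.
expertise = W / W.sum(axis=1, keepdims=True)
print(expertise.shape)
```

The multiplicative updates keep both factors nonnegative, so each row of `expertise` is a distribution of a user's activity over the latent areas.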
Approximating turbulent and non-turbulent events with the Tensor Train decomposition method
Low-rank multilevel approximation methods are often well suited to attacking high-dimensional problems, and they allow very compact representation of large data sets. Specifically, hierarchical tensor product decomposition methods, e.g., the Tree-Tucker format and the Tensor Train format, emerge as a promising approach for application to data concerned with cascade-of-scales problems as arise, e.g., in turbulent fluid dynamics. Beyond multilinear mathematics, those tensor formats are also successfully applied in, e.g., physics and chemistry, where they are used in many-body problems and quantum states. Here, we focus on two particular objectives: we aim at capturing self-similar structures that might be hidden in the data, and we present the reconstruction capabilities of the Tensor Train decomposition method tested with 3D channel turbulence flow data.
WTEN: An advanced coupled tensor factorization strategy for learning from imbalanced data
© Springer International Publishing AG 2016. Learning efficiently from imbalanced and sparse data in multi-mode, high-dimensional tensor formats is a significant problem in data mining research. On one hand, Coupled Tensor Factorization (CTF) has become one of the most popular methods for joint analysis of heterogeneous sparse data generated from different sources. On the other hand, techniques such as sampling, cost-sensitive learning, etc. have been applied to many supervised learning models to handle imbalanced data. This research focuses on studying the effectiveness of combining the advantages of both CTF and imbalanced-data learning techniques for missing-entry prediction, especially for entries with rare class labels. Importantly, we have also investigated the implications of joint analysis of the main tensor and extra information. One of our major goals is to design a robust weighting strategy for CTF that can not only effectively recover missing entries but also perform well when the entries are associated with imbalanced labels. Experiments on both real and synthetic datasets show that our approach outperforms existing CTF algorithms on imbalanced data.